Categories

Versions

You are viewing the RapidMiner Studio documentation for version 10.0 - Check here for latest version

Group Into Collection (Operator Toolbox)

Synopsis

This operator generates a collection of ExampleSets out of a single ExampleSet by performing a group by operation on the input ExampleSet. The "group by attribute" parameter can be specified. Each ExampleSet of the collection will contain all examples of one unique instance of the ''group by attribute'' parameter. The collection can be processed further by using operators that take collections as input, such as the 'Loop Collection' operator.

Input

  • exa (Data Table)

    The input ExampleSet which should be grouped into a collection.

Output

  • col (Collection)

    The resulting collection of ExampleSets.

  • org (Data Table)

    The original ExampleSet.

Parameters

  • group_by_attribute The name of the group by attribute. The resulting collection will contain one ExampleSet for each unique instance of the attribute. Range:
  • sorting_order The sorting order for the resulting collection.
    • none: The collection is not ordered in any specific way.
    • alphabetical: The collection is ordered due to the natural order of the string representation of the values of the ''group by attribute'' parameter. In short 0-9A-Za-z. Be aware that strings are compared character-by-character, e.g. ''A2'' comes before ''A11''.
    • numerical: The collection is ordered due to the numerical values of the group by attribute in the original ExampleSet. Be aware that the group by attribute has to be a numerical one. A user error is thrown if otherwise.
    • occurrence: The collection is ordered due to the occurrences of the values of the group by attribute in the original ExampleSet.
    Range:

Tutorial Processes

Grouping an ExampleSet into a collection

In this tutorial we organize the Golf data set into a collection by grouping by the attribute ''Outlook''. For each ExampleSet in the collection the two examples with the highest temperatures are filtered. Both the collection and the filtered ExampleSet are delivered to the results panel.

Example Demonstration of the different sorting orders

In this tutorial we use the Group Into collection operator on the Titanic data set from the Samples folder. The operator is applied on the ExampleSet using the 'Age' attribute in four different ways. Each way of sorting is used once. The differences are explained in the comments of the process.